Error Tree: A Tree Structure for Hamming & Edit Distances & Wildcards Matching
Author
Abstract
Error Tree is a novel tree structure aimed mainly at the approximate pattern matching problems under Hamming and edit distance, as well as the wildcard matching problem. The input is a text of length $n$ over a fixed alphabet of size $\Sigma$, a pattern $P$ of length $m$, and an integer $k$. The output is every position where the text matches $P$ within Hamming distance $\le k$, within edit distance $\le k$, or with $\le k$ wildcards. For Hamming distance and wildcard matching, the proposed tree structure needs $O\left(n \frac{\log_{\Sigma}^{k} n}{k!}\right)$ words and answers a query for any online or offline pattern in $O\left(\frac{m^{k}}{k!} + occ\right)$ time ($O\left(m + \frac{\log_{\Sigma}^{k} n}{k!} + occ\right)$ in the average case), where $occ$ is the number of outputs. For edit distance, the tree structure needs $O\left(2^{k} n \frac{\log_{\Sigma}^{k} n}{k!}\right)$ words and answers a query for any online or offline pattern in $O\left(\frac{m^{k}}{k!} + 3^{k} occ\right)$ time ($O\left(m + \frac{\log_{\Sigma}^{k} n}{k!} + 3^{k} occ\right)$ in the average case).
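The abstract states only the input and output of the three query types, not how the Error Tree itself is built, so the sketch below is not the paper's method. It is a naive brute-force reference that pins down what a query should return and can serve as a correctness oracle for a faster index. Assumptions, none of which come from the source: positions are 0-based; Hamming and wildcard queries report start positions; the edit-distance query reports end positions via Sellers' dynamic programming (an edit-distance occurrence has no fixed length, so end positions are the usual convention); and the wildcard model is a pattern containing at most $k$ wildcard symbols, written here as the hypothetical character `?`, each matching exactly one text character. The function names are illustrative.

```python
from typing import List


def hamming_matches(text: str, pattern: str, k: int) -> List[int]:
    """0-based start positions i where text[i:i+m] has Hamming distance <= k to pattern."""
    m = len(pattern)
    out = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for t, p in zip(text[i:i + m], pattern):
            if t != p:
                mismatches += 1
                if mismatches > k:  # early exit: this window can no longer qualify
                    break
        if mismatches <= k:
            out.append(i)
    return out


def wildcard_matches(text: str, pattern: str, k: int, wildcard: str = "?") -> List[int]:
    """0-based start positions where pattern matches, each wildcard matching any one character.

    Assumption (not from the source): the pattern may contain at most k wildcard symbols.
    """
    if pattern.count(wildcard) > k:
        raise ValueError("pattern contains more than k wildcards")
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == wildcard or p == t for t, p in zip(text[i:i + m], pattern))
    ]


def edit_distance_matches(text: str, pattern: str, k: int) -> List[int]:
    """0-based end positions j such that some substring text[s:j+1] has edit distance <= k to pattern.

    Sellers' dynamic programming, one text column at a time: prev[i] is the
    smallest edit distance between pattern[:i] and any substring of text
    ending just before the current character.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # empty text prefix: pattern[:i] costs i deletions
    out = []
    for j, c in enumerate(text):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(
                prev[i] + 1,         # text character c is an extra (inserted) character
                cur[i - 1] + 1,      # pattern[i-1] is missing from the text (deleted)
                prev[i - 1] + cost,  # match or substitution
            )
        if cur[m] <= k:
            out.append(j)
        prev = cur
    return out


if __name__ == "__main__":
    text = "abcabd"
    print(hamming_matches(text, "abd", 1))        # [0, 3]
    print(wildcard_matches(text, "a?d", 1))       # [3]
    print(edit_distance_matches(text, "abd", 1))  # [1, 2, 4, 5]
```

Each function takes $O(nm)$ time per query and builds no index. That per-query scan over the whole text is the cost a structure like the one described in the abstract is meant to avoid, which is why this version is mainly useful for checking a faster implementation's output.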
Related articles
Nikolaus Augsten: Approximate Matching of Hierarchical Data
The goal of this thesis is to design, develop, and evaluate new methods for the approximate matching of hierarchical data represented as labeled trees. In approximate matching scenarios two items should be matched if they are similar. Computing the similarity between labeled trees is hard as in addition to the data values also the structure must be considered. A well-known measure for comparing...
Matching and Embedding through Edit-Union of Trees
This paper investigates a technique to extend the tree edit distance framework to allow the simultaneous matching of multiple tree structures. This approach extends a previous result that showed the edit distance between two trees is completely determined by the maximum tree obtained from both tree with node removal operations only. In our approach we seek the minimum structure from which we ca...
Approximate pattern matching under generalised edit distance and extensions to suffix array library
The approximate pattern matching problem is the problem of finding all occurrences of a certain pattern in a usually much longer text allowing for a fixed error threshold in the matching. The problem has been studied extensively and many very good solutions were found. However, significant instances of the problem, namely those allowing for generalised edit-distance error functions, remain witho...
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traffic and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
Journal title:
- CoRR
Volume abs/1506.04486
Pages -
Publication date 2015